A joint translation model with integrated reordering

Author

  • Nadir Durrani
Abstract

This dissertation aims to combine the benefits and remedy the flaws of the two popular frameworks in statistical machine translation, namely phrase-based MT and N-gram-based MT. Phrase-based MT advanced the state of the art by translating phrases³ rather than words. By memorizing phrases, phrasal MT is able to learn local reorderings and to handle other local dependencies such as insertions and deletions. Inter-phrasal reorderings are handled through the lexicalized reordering model, which remains the state-of-the-art model for reordering in phrase-based SMT to date. However, phrase-based MT has some drawbacks:

  • Dependencies across phrases are not directly represented in the translation model
  • Discontinuous phrases cannot be represented and used
  • The reordering model is not designed to handle long-range reorderings
  • Search and modeling problems require the use of a hard reordering limit
  • The presence of many different equivalent segmentations increases the search space

³ Note that "phrases" in this thesis means sequences of words, which may not be linguistic phrases.

Similar resources

A Joint Sequence Translation Model with Integrated Reordering

We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance r...

Word Reordering in Statistical Machine Translation with a POS-Based Distortion Model

In this paper we describe a word reordering strategy for statistical machine translation that reorders the source side based on Part of Speech (POS) information. Reordering rules are learned from the word aligned corpus. Reordering is integrated into the decoding process by constructing a lattice, which contains all word reorderings according to the reordering rules. Probabilities are assigned ...

Modeling the Translation of Predicate-Argument Structure for SMT

Predicate-argument structure contains rich semantic information of which statistical machine translation hasn’t taken full advantage. In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation: 1) a predicate translation model and 2) an argument reordering model. The predicate translation model explores lexical ...

Dynamic distortion in a discriminative reordering model for statistical machine translation

Most phrase-based statistical machine translation systems use a so-called distortion limit to keep the size of the search space manageable. In addition, a distance-based distortion penalty is used as a feature to keep the decoder translating monotonically unless there is sufficient support for a jump from other features, particularly the language models. To overcome the issue of setting the op...

Reordering by Parsing

We present a new discriminative reordering model for statistical machine translation. The model employs a standard data-driven dependency parser to predict reorderings based on syntactic information. This is made possible through the introduction of a reordering structure, which is a word alignment structure where the target word order is transposed onto the source sentence as a path. The appro...

Publication date: 2012